Focusing on Open-Source and Free Software
Open-source software is software whose source code is publicly available and that is typically
developed and supported by a user community. Open-source software is usually free, but it is still
licensed, and the license requires you to adhere to certain policies when you use the software.
In this section, we talk about the two most popular open-source
statistical software packages: R and Python.
Open-source software
The two most popular and extensive open-source statistical programs are R and Python.
R: R is statistical software that has been developed and is maintained by the R user community. It
has two interfaces: R GUI, which looks similar to PC SAS and SPSS, and RStudio, which is an
integrated development environment (IDE). Analysts prefer to use RStudio when developing
graphical displays for the web, while R GUI is fine for most statistical work. To run R, you
download and install the base application. Then, for specific functions not included in the base
application, you install additional R packages. As with PC SAS, in R you import or connect to
datasets, develop and save code files to run on those datasets, and produce output you can save.
Base R, R packages, and documentation are available on the Comprehensive R Archive Network
(CRAN) server at https://cran.r-project.org.
Python: Python is an open-source programming language that is often used to analyze data. As
with R, Python is developed and maintained by its own user community and runs in a similar way.
Although you still develop code that runs against datasets in the Python environment, Python
code and R code are written differently. Where R has packages, Python has libraries. Python is available at
www.python.org/downloads.
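To make the idea of developing code that runs against a dataset concrete, here is a minimal sketch in Python. It uses only modules that ship with Python (csv, io, and statistics), so nothing extra needs to be installed; a third-party library such as pandas would have to be installed separately. The data values, the column names group and score, and the function name summarize are all made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical data standing in for a dataset file. In practice you would
# read a CSV file from disk instead of this in-memory text.
RAW_DATA = """group,score
A,12.5
A,14.0
B,9.5
B,11.0
B,10.5
"""


def summarize(csv_text):
    """Return {group: (count, mean score)} for the rows in csv_text."""
    scores = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Collect each score under its group label.
        scores.setdefault(row["group"], []).append(float(row["score"]))
    return {g: (len(v), statistics.mean(v)) for g, v in scores.items()}


# Run the code against the dataset and produce output.
summary = summarize(RAW_DATA)
for group, (n, mean) in sorted(summary.items()):
    print(f"{group}: n={n}, mean={mean:.2f}")
```

The pattern is the same one described for R: bring in a dataset, run saved code against it, and produce output. With a real file, the loop would read from a file opened with open() rather than from text stored in the script.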
Students often wonder what the differences are between R and Python, and which one to learn.
For statistical work, the two offer essentially the same capabilities, although scientific
disciplines have leaned toward adopting R, and engineering disciplines have leaned toward Python.
Many students find it easy to learn both.
Other free statistical software
Some other statistical software packages are free but not technically open-source, meaning they
were not developed by an open-source community, and they are not licensed the same way.
Software that performs many functions
This section provides examples of free software that performs many functions like SAS and R.
OpenStat and LazStats are free statistical programs developed by Dr. Bill Miller that use menus
that resemble SPSS. Dr. Miller provides several excellent manuals and textbooks that support both
programs. OpenStat and LazStats are available at https://openstat.info.
Epi Info was developed by the United States Centers for Disease Control and Prevention (CDC) to
acquire, manage, analyze, and display the results of epidemiological research. What makes it different from other